Enhanced and unenhanced: Radiomics models for discriminating between benign and malignant cystic renal masses on CT images: A multi-center study

Background Machine learning algorithms used to classify cystic renal masses (CRMs) have not been applied to unenhanced CT images, and their diagnostic accuracy has not been compared against radiologists. Method This retrospective study aimed to develop radiomics models that discriminate between benign and malignant CRMs in a triple-phase computed tomography (CT) protocol and to compare the diagnostic accuracy of the radiomics approach with that of experienced radiologists. Predictive models were established using a training set and validation set of unenhanced and enhanced (arterial phase [AP] and venous phase [VP]) CT images of benign and malignant CRMs. The diagnostic capabilities of the models and experienced radiologists were compared using Receiver Operating Characteristic (ROC) curves. Results On unenhanced, AP and VP CT images in the validation set, the AUC, specificity, sensitivity and accuracy for discriminating between benign and malignant CRMs were 90.0% (95% CI: 81–98%), 90.0%, 90.5% and 90.2%; 93.0% (95% CI: 86–99%), 86.7%, 95.2% and 88.3%; and 95.0% (95% CI: 90–100%), 93.3%, 90.5% and 92.1%, respectively, for the radiomics models. Diagnostic accuracy of the radiomics models differed significantly on unenhanced images in the training set vs. each radiologist (p = 0.001 and 0.003) but not in the validation set (p = 0.230 and 0.590); differed significantly on AP images in the validation set vs. each radiologist (p = 0.007 and 0.007) but not in the training set (p = 0.663 and 0.663); and there were no differences on VP images in the training or validation sets vs. each radiologist (training set: p = 0.453 and 0.051, validation set: p = 0.236 and 0.786). Conclusions Radiomics models may have clinical utility for discriminating between benign and malignant CRMs on unenhanced and enhanced CT images. The performance of the radiomics model on unenhanced CT images was similar to that of experienced radiologists, implying it has potential as a screening and diagnostic tool for CRMs.


Introduction
Cystic renal masses (CRMs) are lesions with less than 25% enhancing tissue that are often identified incidentally on abdominal computed tomography (CT) scans [1]. The majority of CRMs are benign, but some may be renal cell carcinoma (RCC) or other rare malignant tumors of the kidney [2,3]. The 2019 Bosniak classification of CRMs, updated from 2005, stratifies CRMs according to their risk of malignancy [1]. In individual series, most Bosniak I and II CRMs are benign, while approximately 10-20% of Bosniak IIF, 50% of Bosniak III, and 90% of Bosniak IV CRMs are malignant [3,4]. The variable malignancy risk in Bosniak IIF and III CRMs has a substantial psychological impact on patients, and necessitates follow-up, which incurs significant health care resource utilization and costs [5][6][7].
There is an unmet clinical need for an objective strategy that assists radiologists and surgeons in the identification of benign and malignant CRMs on CT images. The Bosniak classification has inherent limitations such as bias and inter-observer variability, despite clearly defined terms, imaging features, and classes [5][6][7][8]. Radiologists rely on visual inspection of CT images for the diagnosis of CRMs, which can be impacted by image noise and resolution, especially on unenhanced scans [4].
Emerging research shows that radiomics features can reveal key components of tumor phenotype in three dimensions [9], and radiomics models have excellent diagnostic efficacy for RCC on enhanced and unenhanced CT scans [10][11][12][13][14]. Several machine learning algorithms have been applied to classify CRMs as benign or malignant on contrast-enhanced CT images taken in the arterial phase (AP) and venous phase (VP) [15][16][17]; however, these algorithms were not applied to unenhanced CT images alone or validated with external datasets [15], and their diagnostic accuracy was not compared against radiologists. The objective of this multicenter study was to use a radiomics approach to discriminate between benign and malignant CRMs on all phases of a triple-phase CT protocol (unenhanced, AP, VP) and compare the diagnostic accuracy of the radiomics approach with that of experienced radiologists.

Patients
This retrospective study was approved by the medical ethics committee of the Guangdong Provincial Hospital of Traditional Chinese Medicine (No. ZE2023-090-01), and the requirement for written informed consent was waived.
Patients with CRMs who were treated at Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou and Guangdong Provincial Hospital of Traditional Chinese Medicine, Zhuhai between January 2018 and June 2023 were eligible for this study. Inclusion criteria were: 1) ≥ Bosniak category IIF CRMs; 2) unenhanced and enhanced CT scans (including AP and VP images); 3) complete records of patient demographic and clinical characteristics, including age, sex, location of masses, intraoperative or biopsy results and histopathological findings; and 4) CT images stored in the picture archiving and communication system (PACS) of the institution. Patients with low-quality or incomplete CT data were excluded.

CT examinations
All patients underwent unenhanced and dual-phase contrast-enhanced CT using three CT scanners (Definition Flash, Siemens, Forchheim, Germany, in Guangzhou; IQon Spectral, Philips Healthcare, Amsterdam, Netherlands, in Guangzhou; Aquilion One 750W, Canon, Tokyo, Japan, in Zhuhai). Imaging parameters were: tube voltage 120 kV, tube current 250 mA, section interval 5 mm, section thickness 5 mm, and matrix 512 × 512. After conventional unenhanced imaging, 100-120 ml iopromide (Ultravist 370, Bayer Schering Pharma, Germany) was injected into the median cubital vein via a pump injector (MEDRAD Stellant CT, Ulrich Medical, Ulm, Germany) at 3-4 ml/s. The AP scan was triggered by aortic enhancement. The VP scan started at a delay of 60 s from the beginning of contrast injection.
One radiologist (LH) with 18 years of experience analyzed the CT images to ensure the renal masses had < 25% enhancing tissue (i.e., lesions were cystic masses), confirm the Bosniak class (Version 2019), and determine the size of the CRMs.

Mass segmentation and radiomics feature extraction
The open-source software platform 3D Slicer version 5.2.1 (www.slicer.org) was used for mass segmentation and to calculate radiomics features. Masses were delineated on the CT images using 3D Slicer. Segmentation of whole masses was performed by associate chief radiologists (LT and HL), who have more than fifteen years of experience with abdominal radiographs. After segmentation, 855 radiomics features were extracted using the Python pyradiomics package (JetBrains PyCharm Community Edition 2017.2.4; https://www.jetbrains.com/pycharm/). To ensure the stability of radiomics features extracted from CT images, segmentation and feature extraction were repeated in 80 randomly selected patients with CRMs. The intraclass correlation coefficient (ICC) was used to evaluate consistency across radiomics features. Radiomics features with ICC > 0.75 were considered stable and were included in this analysis.
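The stability filter described above can be illustrated with a minimal Python sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here, as are the function names and the toy feature names.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per patient, one column per segmentation round.
    The choice of ICC(2,1) is an assumption; the study does not name the form.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of repeated measurements
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom if denom != 0 else 0.0


def stable_features(first_pass, second_pass, threshold=0.75):
    """Keep features whose ICC between two extraction rounds exceeds threshold.

    first_pass, second_pass: dicts mapping a feature name to its per-patient values.
    """
    stable = []
    for name in first_pass:
        pairs = [list(p) for p in zip(first_pass[name], second_pass[name])]
        if icc_2_1(pairs) > threshold:
            stable.append(name)
    return stable
```

With real data, each dictionary would hold the per-patient values of the extracted features for one segmentation round in the repeated subset, and only the names returned by `stable_features` would be carried forward.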

Diagnostic efficacy of radiologists
The diagnostic efficacy of radiologists was assessed in two attending radiologists (HF and XL) with more than seven years of experience in diagnostic imaging who used the open-source DICOM viewer, MicroDicom (https://www.microdicom.com/). The radiologists were from hospitals not involved in this study and were blinded to patient demographic and clinical characteristics. The radiologists independently reviewed the CT images in the following order: unenhanced, AP and VP, with a two-week interval between each phase. Diagnosis was based on the Bosniak classification and the radiologists' clinical experience.

Statistical analysis
Statistical analysis was conducted with SPSS (version 26.0, IBM, Armonk, NY, USA) and R (version 4.2.2). A two-sided P value < 0.05 was considered statistically significant.
Differences in demographic and clinical characteristics were assessed using the χ² test or independent sample t-test, as appropriate.
Radiomics models were established using a training set of 77 benign and 85 malignant CRMs from Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou. The models were tested with a validation set of 21 benign and 30 malignant CRMs from Guangdong Provincial Hospital of Traditional Chinese Medicine, Zhuhai.
Radiomics feature selection was necessary to avoid overfitting the models. Univariate analysis (Student's t-test or Mann-Whitney U test) was used to identify stable radiomics features that differed significantly between benign and malignant CRMs in the training set. Candidate radiomics features to generate the models were selected with the least absolute shrinkage and selection operator (LASSO) using the "glmnet" package in R. Bidirectional elimination was used to filter out potentially irrelevant radiomics features using the "MASS" package in R. Selected features were combined into linear regression equations.
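The LASSO step can be illustrated with a small coordinate-descent sketch. This is not the glmnet implementation used in the study: it is written in Python rather than R, it assumes a squared-error loss on standardized features, and the penalty value, function names, and toy data are all hypothetical. It only shows how the L1 penalty sets the coefficients of uninformative features to exactly zero.

```python
import math


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardize(column):
    """Center a feature column and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if sd == 0:
        return [0.0] * n  # a constant feature carries no information
    return [(v - mean) / sd for v in column]


def lasso_select(features, outcome, penalty, sweeps=100):
    """Coordinate-descent LASSO; returns coefficients on the standardized scale.

    features: list of feature columns, each with one value per lesion.
    outcome:  list of 0/1 labels (0 = benign, 1 = malignant).
    Minimizes (1 / 2n) * ||y - Xb||^2 + penalty * ||b||_1 (an assumed loss).
    """
    n = len(outcome)
    cols = [standardize(c) for c in features]
    y_mean = sum(outcome) / n
    residual = [y - y_mean for y in outcome]
    coefs = [0.0] * len(cols)

    for _ in range(sweeps):
        for j, col in enumerate(cols):
            if not any(col):
                continue  # skip constant features
            # Covariance of feature j with the residual, adding back its own effect.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual)) / n
            new = soft_threshold(rho, penalty)
            delta = new - coefs[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coefs[j] = new
    return coefs
```

Features whose returned coefficient is non-zero would be the candidates passed on to the next selection step. In practice the penalty is usually chosen by cross-validation rather than fixed by hand.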
The discrimination of the models for benign and malignant CRMs was evaluated using the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve, and the sensitivity, specificity, and accuracy of diagnosis of benign and malignant CRMs by the models and radiologists were compared (Fig 1).
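As an illustration of the evaluation metrics named above, the following Python sketch computes sensitivity, specificity, accuracy, and AUC from labels and model scores. The function names, the 0.5 cut-off, and the toy data are assumptions for illustration; the study's own analysis was run in SPSS and R.

```python
def confusion_metrics(labels, predictions):
    """Sensitivity, specificity and accuracy for binary labels (1 = malignant)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(labels, scores):
    """AUC as the probability that a malignant case outscores a benign one.

    Equivalent to the area under the empirical ROC curve; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classify(scores, cutoff=0.5):
    """Turn continuous model scores into 0/1 calls at an assumed cut-off."""
    return [1 if s >= cutoff else 0 for s in scores]
```

A radiologist's benign or malignant calls can be passed directly to `confusion_metrics`, while model scores are first thresholded with `classify`, so both are summarized on the same scale.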

Patient characteristics
This study included 213 patients (108 males; 105 females; mean age, 58.8 ± 11.7 years) with CRMs. Of these, 98 patients (54 males; 44 females; mean age, 57.8 ± 14.0 years) had benign CRMs, and 115 patients (54 males; 61 females; mean age, 59.8 ± 11.4 years) had malignant CRMs (Fig 2). There were no significant differences in age, sex, mass location and size between patients with benign and malignant CRMs (Table 1). All the benign CRMs were simple kidney cysts except one case of angiomyolipoma (AML). All the malignant CRMs were clear cell carcinoma except one case of mixed epithelial and stromal tumor of the kidney (MESTK).

Diagnostic accuracy of the radiologists
On unenhanced CT images, the sensitivity, specificity and accuracy of the two radiologists for discriminating between benign and malignant CRMs were 80.0%, 94.6% and 86.5%, and 84.3%, 91.3% and 87.0%, respectively. On AP CT images, the sensitivity, specificity and accuracy of the two radiologists for discriminating between benign and malignant CRMs were 95.6%, 100% and 97.1%, and 95.6%, 100% and 97.1%, respectively. On VP CT images, the sensitivity, specificity and accuracy of the two radiologists for discriminating between benign and malignant CRMs were 95.6%, 97.8% and 96.6%, and 93.1%, 93.5% and 93.2%, respectively (Table 2).
The AUC, specificity, sensitivity and accuracy of the radiomics models for discriminating between benign and malignant CRMs in the training set and validation set were not significantly different (S1 Table).
The diagnostic accuracy of the radiomics models for discriminating between benign and malignant CRMs differed significantly from that of each radiologist on unenhanced images in the training set (p = 0.001 and 0.003) but not in the validation set (p = 0.230 and 0.590); differed significantly from that of each radiologist on AP images in the validation set (p = 0.007 and 0.007) but not in the training set (p = 0.663 and 0.663); and did not differ from that of either radiologist on VP images in the training or validation sets (training set: p = 0.453 and 0.051, validation set: p = 0.236 and 0.786).

Discussion
This multicenter study used a radiomics approach to discriminate between benign and malignant CRMs on all phases of a triple-phase CT protocol (unenhanced, AP and VP) and compared the diagnostic accuracy of the radiomics approach to that of experienced radiologists. Findings showed that diagnostic performance of the radiomics models on unenhanced, AP and VP CT images was satisfactory and similar to or better than that of experienced radiologists.
Compared to visual inspection of CT images for the diagnosis of CRMs, a radiomics approach may provide a more comprehensive representation of the microscopic heterogeneity of the masses [27][28][29], offering an accurate representation of a lesion's pathology. The present study builds on prior research that verified the utility and stability of radiomics features for diagnosis of CRMs [10,12,15,30], applies the radiomics approach independent of the Bosniak classification, and compares the performance of radiomics models with the Bosniak classification in pathology-proven benign and malignant CRMs. Although a previous study has demonstrated that a CT texture-based machine learning algorithm can differentiate benign from malignant CRMs on contrast-enhanced abdominal CT scans [15], to the authors' knowledge, the present study is the first to apply a radiomics approach to CRM diagnosis on unenhanced CT images alone, and to compare the diagnostic performance of the radiomics models with that of experienced radiologists [15][16][17].
The radiomics models on unenhanced, AP and VP CT images in the validation set had relatively high sensitivity (86.7%-93.3%) and specificity (88.3%-92.1%) for distinguishing benign and malignant CRMs. The sensitivity, specificity and accuracy of the AP radiomics models in the validation set were lower than those of the radiologists; however, the sensitivity, specificity and accuracy of the unenhanced and VP radiomics models in the validation set were not significantly different from those of either radiologist. These data imply that the single-phase radiomics models, especially the unenhanced and VP models, have important and practical clinical applications.
On unenhanced CT images, diagnostic accuracy of the two radiologists for discriminating between benign and malignant CRMs was lower than on AP and VP images, possibly due to the poor contrast between normal and pathological tissue. The unenhanced radiomics model seemed to be less impacted by the lower tissue contrast, and provided satisfactory diagnostic efficacy. These data imply that the unenhanced radiomics model has potential as a valuable diagnostic tool for CRMs in clinical and radiological practice. Most CRMs are found incidentally, for example on unenhanced chest CT or annual CT examinations, and CT contrast agents may be contraindicated in patients with renal insufficiency, making diagnosis of CRMs difficult. The unenhanced radiomics model may assist in screening, provide a preliminary diagnosis, and inform clinical decision-making.
This study was associated with several limitations. First, visual inspection of unenhanced, AP and VP CT images at 2-week intervals may have led to bias as radiologists may recall their initial diagnoses. Second, each center in this study used similar CT scanning systems and CT scanning parameters, so findings may not be generalizable to centers that use different systems and parameters. Third, the validation set included more malignant (n = 30) than benign (n = 21) CRMs, which may have introduced bias. Finally, our study did not apply other machine learning classifiers such as random forest, decision tree, or support vector machines. These will be applied to strengthen our findings in future research.

Conclusion
In conclusion, this study developed radiomics models that have clinical utility for discriminating benign and malignant CRMs on unenhanced and enhanced CT images. The performance of the radiomics model on unenhanced CT images was similar to that of experienced radiologists and may have value as a potential screening and diagnostic tool for CRMs.